Channel: RapidMiner Studio Forum topics
Viewing all 2122 articles
Browse latest View live

Filter operator does not update statistics


Hi,

 

If I filter the dataset, the statistics for a polynominal column of the dataset are not updated. As a result, when I plot a chart whose axis is based on this column, the axis is not auto-ranged, i.e. the data points appear only on the left side. It works with some datasets and not with others. The workaround is to filter everything, store it in Excel and then load it again, but as you can imagine that is not ideal. The picture shows the statistics.

 

<?xml version="1.0" encoding="UTF-8"?><process version="8.1.001">
<operator activated="true" class="retrieve" compatibility="8.1.001" expanded="true" height="68" name="Retrieve AWAD8400_AWAD9700" width="90" x="45" y="544">
<parameter key="repository_entry" value="../../data/Full dataset/AWAD8400_AWAD9700"/>
</operator>
</process>
<?xml version="1.0" encoding="UTF-8"?><process version="8.1.001">
<operator activated="true" class="filter_examples" compatibility="8.1.001" expanded="true" height="103" name="Filter Examples" width="90" x="179" y="544">
<parameter key="parameter_expression" value=""/>
<parameter key="condition_class" value="custom_filters"/>
<parameter key="invert_filter" value="false"/>
<list key="filters_list">
<parameter key="filters_entry_key" value="Application version Minor.eq.6\.0"/>
<parameter key="filters_entry_key" value="Hour - start.ge.6"/>
<parameter key="filters_entry_key" value="Hour - start.lt.14"/>
<parameter key="filters_entry_key" value="Profile.is_in.ProfileA;ProfileB"/>
<parameter key="filters_entry_key" value="SetupId.is_in.INV0422\.014;INV0422\.015"/>
</list>
<parameter key="filters_logic_and" value="true"/>
<parameter key="filters_check_metadata" value="true"/>
</operator>
</process>

 

 


Problem with generalized linear model (lambda search)


Hi all,

I'm trying to do classification using a generalized linear model.

In the default setting, the lambda value is chosen by H2O (as described in the documentation).

However, I found that if I use lambda search, the performance is much better.

I don't understand the difference between these two methods.

Does the better performance with lambda search come from overfitting?

 

Thanks!

Best,

Scott

SPADE algorithm for sequential pattern mining: does RapidMiner have it?


Hi RapidMiner,

 

SPADE (Sequential Pattern Discovery using Equivalence classes) is another algorithm for sequential pattern mining besides GSP, and CM-SPADE is an improved version of SPADE that uses co-occurrence information. May I know whether RapidMiner has either SPADE or CM-SPADE? If not, do you plan to implement these two? I have seen some articles saying that SPADE is faster than GSP on large datasets.

 

Thank you very much for this information.

 

P.S.: Below are the links to the SPADE and CM-SPADE papers:

 

https://link.springer.com/content/pdf/10.1023/A:1007652502315.pdf

 

https://www.philippe-fournier-viger.com/spmf/PAKDD2014_sequential_pattern_mining_CM-SPADE_CM-SPAM.pdf

 

Best Regards,

Phi Vu

Neural Network With Multiple Outputs for Prediction


Hi,

I want a neural network with two outputs for prediction. I usually use the "Set Role" operator to set a label, so the neural network has one output. How do I configure it to get two outputs?

Thank you!

Generate a Random Number (New Number With Each Instance)


Hi there

 

I'm working on a process that creates two random numbers. I have already tested the following approaches:

  • I used the "Generate Data" operator and tried to use both the attributes and the label
  • I used the "Generate Attributes" operator with rand()

Each approach gives me two numbers in the form I want. However, if I run the process a second time, the numbers do not change (no new random number is generated): every new run of the process outputs the same numbers.

Which operator or workaround should I use to get new random numbers each time I run the process?

 

Many thanks in advance!

Roman
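
The behaviour described above (identical "random" numbers on every run) is what a pseudo-random generator produces when it starts from the same seed each time. Whether that is the cause in this process is an assumption; the Python sketch below only illustrates the general effect. The function name `draw_two` and the seed value 42 are hypothetical.

```python
import random
from typing import Optional, Tuple


def draw_two(seed: Optional[int] = None) -> Tuple[float, float]:
    """Return two pseudo-random numbers in [0, 1).

    With a fixed seed the sequence is reproducible; with seed=None the
    generator is seeded from the operating system or the clock, so each
    call starts from a different state.
    """
    rng = random.Random(seed)
    return rng.random(), rng.random()


# Same seed: both "runs" return exactly the same pair of numbers.
fixed_first = draw_two(seed=42)
fixed_second = draw_two(seed=42)

# No seed: each "run" gets a fresh pair.
fresh_first = draw_two()
fresh_second = draw_two()

print("fixed seed, identical runs:", fixed_first == fixed_second)
print("no seed, identical runs:", fresh_first == fresh_second)
```

If a fixed seed is indeed involved, the usual remedy is to leave the seed unset or derive it from something that changes between runs, such as the current time.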

Looping through an example to generate more examples


My data is confidential, so I cannot post it.

Here is an example of what my data looks like:

|empty|attribute1|attribute2|attribute3|attribute4|attribute5|attribute6|attribute7|attribute8|attribute9|attribute10|empty|attribute1|attribute2|attribute3|attribute4|attribute5|attribute6|attribute7|attribute8|attribute9|attribute10|empty|attribute1|attribute2|attribute3|attribute4|attribute5|attribute6|attribute7|attribute8|attribute9|attribute10|empty|attribute1|attribute2|attribute3|attribute4|attribute5|attribute6|attribute7|attribute8|attribute9|attribute10|empty|attribute1|attribute2|attribute3|attribute4|attribute5|attribute6|attribute7|attribute8|attribute9|attribute10|empty|attribute1|attribute2|attribute3|attribute4|attribute5|attribute6|attribute7|attribute8|attribute9|attribute10|empty|attribute1|attribute2|attribute3|attribute4|attribute5|attribute6|attribute7|attribute8|attribute9|attribute10|empty|attribute1|attribute2|attribute3|attribute4|attribute5|attribute6|attribute7|attribute8|attribute9|attribute10|empty|attribute1|attribute2|attribute3|attribute4|attribute5|attribute6|attribute7|attribute8|attribute9|attribute10|empty|attribute1|attribute2|attribute3|attribute4|attribute5|attribute6|attribute7|attribute8|attribute9|attribute10|empty|attribute1|attribute2|attribute3|attribute4|attribute5|attribute6|attribute7|attribute8|attribute9|attribute10|empty|attribute1|attribute2|attribute3|attribute4|attribute5|attribute6|attribute7|attribute8|attribute9|attribute10|empty|attribute1|attribute2|attribute3|attribute4|attribute5|attribute6|attribute7|attribute8|attribute9|attribute10|empty|attribute1|attribute2|attribute3|attribute4|attribute5|attribute6|attribute7|attribute8|attribute9|attribute10|

I would like to transform it into:

|empty|attribute1|attribute2|attribute3|attribute4|attribute5|attribute6|attribute7|attribute8|attribute9|attribute10|

|empty|attribute1|attribute2|attribute3|attribute4|attribute5|attribute6|attribute7|attribute8|attribute9|attribute10|

|empty|attribute1|attribute2|attribute3|attribute4|attribute5|attribute6|attribute7|attribute8|attribute9|attribute10|

|empty|attribute1|attribute2|attribute3|attribute4|attribute5|attribute6|attribute7|attribute8|attribute9|attribute10|

|empty|attribute1|attribute2|attribute3|attribute4|attribute5|attribute6|attribute7|attribute8|attribute9|attribute10|

|empty|attribute1|attribute2|attribute3|attribute4|attribute5|attribute6|attribute7|attribute8|attribute9|attribute10|

|empty|attribute1|attribute2|attribute3|attribute4|attribute5|attribute6|attribute7|attribute8|attribute9|attribute10|

 

Please help me do this.
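
The transformation asked for above amounts to cutting one very wide row into consecutive chunks of 11 values (the `empty` column plus `attribute1` to `attribute10`). A minimal Python sketch of that idea follows, assuming the wide row is already available as a flat list of values; the helper name `split_into_rows` is hypothetical and this is not a RapidMiner process.

```python
from typing import List, Sequence


def split_into_rows(values: Sequence[str], width: int) -> List[List[str]]:
    """Cut a flat sequence into consecutive rows of `width` values each."""
    if width <= 0:
        raise ValueError("width must be positive")
    if len(values) % width != 0:
        raise ValueError("number of values is not a multiple of the row width")
    return [list(values[i:i + width]) for i in range(0, len(values), width)]


# One logical record: the 'empty' column followed by attribute1..attribute10.
COLUMNS = ["empty"] + [f"attribute{i}" for i in range(1, 11)]

# Stand-in for the confidential data: the same 11 columns repeated 3 times
# in a single wide row.
wide_row = COLUMNS * 3

rows = split_into_rows(wide_row, len(COLUMNS))
for row in rows:
    print("|" + "|".join(row) + "|")
```

The length check is deliberate: if the wide row does not divide evenly into groups of 11, the sketch fails loudly instead of silently producing a short last row.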

 

Loop Files Operator runs forever

Sentiment Analysis for documents-Beginner needs assistance


Hello, I am very new to RapidMiner and have searched for my question but can only find feedback regarding tweets. I have run my process (see below) and get question marks for the polarity and subjectivity, and I'm not entirely sure why. I am analyzing the abstracts of approximately 480 articles (text documents ranging from about three sentences to a paragraph). Below is the code; I am also attaching a screenshot of the results I am receiving. Any help will be greatly appreciated!

<?xml version="1.0" encoding="UTF-8"?><process version="8.1.001"><operator activated="true" class="read_excel" compatibility="8.1.000" expanded="true" height="68" name="Read Excel" width="90" x="45" y="85"><parameter key="excel_file" value="C:\Users\Dara\Desktop\dara\Digitization\Data.xlsx"/><parameter key="sheet_selection" value="sheet number"/><parameter key="sheet_number" value="1"/><parameter key="imported_cell_range" value="A1:J483"/><parameter key="encoding" value="SYSTEM"/><parameter key="first_row_as_names" value="false"/><list key="annotations"><parameter key="0" value="Name"/></list><parameter key="date_format" value=""/><parameter key="time_zone" value="SYSTEM"/><parameter key="locale" value="English (United States)"/><parameter key="read_all_values_as_polynominal" value="false"/><list key="data_set_meta_data_information"><parameter key="0" value="Number.true.integer.attribute"/><parameter key="1" value="Article.true.text.attribute"/><parameter key="2" value="Date.true.date_time.attribute"/><parameter key="3" value="Title.true.text.attribute"/><parameter key="4" value="Address.true.text.attribute"/><parameter key="5" value="Type.true.text.attribute"/><parameter key="6" value="Abstract.true.text.label"/><parameter key="7" value="Country.true.text.attribute"/><parameter key="8" value="Keywords.true.text.attribute"/><parameter key="9" value="Publication.true.text.attribute"/></list><parameter key="read_not_matching_values_as_missings" value="true"/><parameter key="datamanagement" value="double_array"/><parameter key="data_management" value="auto"/></operator></process><?xml version="1.0" encoding="UTF-8"?><process version="8.1.001"><operator activated="true" class="nominal_to_text" compatibility="8.1.001" expanded="true" height="82" name="Nominal to Text" width="90" x="112" y="238"><parameter key="attribute_filter_type" value="all"/><parameter key="attribute" value=""/><parameter key="attributes" value=""/><parameter key="use_except_expression" 
value="false"/><parameter key="value_type" value="nominal"/><parameter key="use_value_type_exception" value="false"/><parameter key="except_value_type" value="file_path"/><parameter key="block_type" value="single_value"/><parameter key="use_block_type_exception" value="false"/><parameter key="except_block_type" value="single_value"/><parameter key="invert_selection" value="false"/><parameter key="include_special_attributes" value="false"/></operator></process><?xml version="1.0" encoding="UTF-8"?><process version="8.1.001"><operator activated="true" class="com.aylien.textapi.rapidminer:aylien_sentiment" compatibility="0.2.000" expanded="true" height="68" name="Analyze Sentiment" width="90" x="380" y="238"><parameter key="connection" value="AylienDara"/><parameter key="input_attribute" value="Article"/><parameter key="sentiment_mode" value="document"/><parameter key="Is input URL" value="false"/></operator></process>

 


Can I and how do I use a correlation matrix for categorical variables?


Hello everyone,

I'm new to RapidMiner, so I apologise in advance for all the silly questions that I ask.

For a project that I am doing for uni, I have a dataset that contains both categorical and numerical variables. We are supposed to choose predictors to predict our label "recommended", which is a binominal variable.

First of all, in addition to the >0.5 correlation rule, can I choose my predictors based on the attribute weights in the AttributeWeight table? How do I interpret this weight table? Why do its values contradict the correlation values?

Second, can I use categorical variables in my correlation matrix? If so, how do I transform my categorical variables into dummy variables so that I can use them in the matrix? I know about the Nominal to Numerical operator, but I am not sure it is the correct way to go because I am getting only negative correlations (that is, 14 attributes negatively correlated with "recommended"). Is that normal?

 

Thanks a TON.

Comma values - best predictive model suggestion


Hello, I would like to know what is the best predictive operator to solve my problem.

 

input_output_comma.PNG

 

So my main goal is to find the best predictive operator which could handle comma values (Input 1, Input 2, Input 3) and values in "Input 4" column (YES/NO) and predict values in "Output" column. I already tried Optimize Parameters(Evolutionary) with operators like [SVM kernel types: (radial, anova, epachenikov etc), Random Forest, Neural Net, Decision Tree] to find optimal values, but the average accuracy of prediction was low.

 

I would really like to know if someone could suggest some predictive operator which could produce the best accuracy for this kind of problem.

NOTE: In every row, the values in columns "Input 1", "Input 2" and "Input 3" always add up to 100%, in case this helps.

Thanks very much in advance.

Facebook saying RapidMiner is not a registered app


When requesting an OAuth token from Facebook using the Facebook extension, I get the following response:

 

App not set up: This app is still in development mode, and you don't have access to it. Switch to a registered test user or ask an app admin for permissions.

How to apply SMOTE upsampling


Hello, sorry, I am new to data mining. I have a project on classifying loan defaults, and my data is imbalanced.

Should I apply SMOTE upsampling before splitting the data or after? My dataset is not large, only 1030 samples.

Nominal to Date


Dear RapidMiner Community

 

I am struggling again with some Nominal to Date conversions and I hope you can help me. I extract a date value with an "Extract Macro" operator and get a nominal value. Now I want to convert this nominal value from the macro back to a date value.

 

The nominal format is:

May 22, 2016

 

How can I convert this into date format? All I get is "Cannot parse date", and I have already tried plenty of options.

 

Any help would be greatly appreciated!

 

Best regards

Felix
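
A "cannot parse" error for a value like "May 22, 2016" usually means the format pattern does not match the text exactly (month name, day, comma, four-digit year) or that the month name is read in a non-English locale. The Python sketch below illustrates matching that layout with the standard library; the function name `parse_nominal_date` is hypothetical. In a Java-style pattern the equivalent would presumably be something like `MMM d, yyyy` with an English locale, which is an assumption and not verified against the operator.

```python
from datetime import date, datetime


def parse_nominal_date(text: str) -> date:
    """Parse a string such as 'May 22, 2016' into a date.

    %B = full month name, %d = day of month, %Y = four-digit year.
    Month names are locale-dependent; this assumes the default
    English ("C") locale of the Python process.
    """
    return datetime.strptime(text.strip(), "%B %d, %Y").date()


parsed = parse_nominal_date("May 22, 2016")
print(parsed.isoformat())  # 2016-05-22
```

Stripping surrounding whitespace first is a cheap safeguard, since values extracted from text often carry stray spaces that make an otherwise correct pattern fail.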

About RapidMiner Studio


Which classifiers or algorithms does RapidMiner use in the backend for preprocessing the data?

Which algorithm is the operator "neural net" trained with?


As shown in its description, the operator "deep learning" is based on a multi-layer feed-forward artificial neural network that is trained with stochastic gradient descent using back-propagation.

 

I use the operator "neural net", and I want to train the net with other algorithms, such as Stochastic Gradient Descent, Levenberg-Marquardt and Bayesian Regularization. How can I do that?

 

Does the operator "neural net" use only back-propagation, without any optimization method?


Where is the predicted result of training set using neural net?


I use the operator "Neural Net" to make a prediction. I split the ExampleSet into a training set and a testing set. I can easily find the prediction for the testing set; however, where is the prediction for the training set? I also want the training result.

Thank you!

Jeffery

[Example] Sentiment Analysis


Hello All, 

 

Enjoy this data set and model example of sentiment analysis from our team of data scientists here at RapidMiner. 

 

Cheers! 

 

Handle error message "Too few examples"


Dear RapidMiner Community,

 

I have divided my process into several sub-processes, each of which contains several filters. These filters select certain values within a certain time frame (e.g. 01.01.2015 - 20.04.2015). Sometimes the filters do not return any value, which triggers the error message "Too few examples". As this information is not important to me, I would like to "catch" this error message and have the process jump to the next sub-process(!) without any message or interruption. Is this possible? I would like to catch all of these error messages and simply let the process run to the end.

 

I hope it is clear what I want to do ;)

 

Best regards

Felix
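
The control flow described above (run each sub-process, and if its filter leaves nothing, skip it silently and continue with the next one) is the classic try/except pattern. In RapidMiner terms this would presumably mean wrapping each sub-process in something like a Handle Exception operator. The Python sketch below only illustrates the pattern; `TooFewExamplesError`, `filter_by_window`, `run_all` and the sample dates are hypothetical stand-ins.

```python
from datetime import date
from typing import Dict, List, Tuple


class TooFewExamplesError(Exception):
    """Hypothetical stand-in for the 'Too few examples' error."""


def filter_by_window(examples: List[date], start: date, end: date) -> List[date]:
    """Keep the examples inside [start, end]; fail if nothing is left."""
    selected = [d for d in examples if start <= d <= end]
    if not selected:
        raise TooFewExamplesError(f"no examples between {start} and {end}")
    return selected


def run_all(examples: List[date],
            windows: Dict[str, Tuple[date, date]]) -> Dict[str, List[date]]:
    """Run every sub-process; skip those whose filter returns nothing."""
    results: Dict[str, List[date]] = {}
    for name, (start, end) in windows.items():
        try:
            results[name] = filter_by_window(examples, start, end)
        except TooFewExamplesError:
            # Deliberately ignored: move on to the next sub-process.
            continue
    return results


examples = [date(2015, 1, 15), date(2015, 3, 2), date(2015, 6, 30)]
windows = {
    "sub_process_1": (date(2015, 1, 1), date(2015, 4, 20)),
    "sub_process_2": (date(2015, 4, 21), date(2015, 5, 31)),  # matches nothing
    "sub_process_3": (date(2015, 6, 1), date(2015, 12, 31)),
}

results = run_all(examples, windows)
print(sorted(results))
```

Only the specific "too few examples" error is caught here, so any other failure would still stop the run and stay visible.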

Process Documents from Files: Include all subdirectories


Hi all,

 

In the parameters box of "Process Documents from Files" I can set the "text directories". 

I have a lot of HTML files on a local hard drive, in a folder called "webpage", with many subfolders (~100). There are too many subfolders to add them all separately. I am missing a checkbox that would include all the subfolders. Is there a way I can achieve this? I have created a CSV file with the columns "class name" and "directory". It would be great if I could import this.

 

Cheers, Roger
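
The two ideas in the post above, walking a folder and all of its subfolders and driving the import from a CSV with "class name" and "directory" columns, can be sketched in plain Python as follows. This is an illustration under assumptions, not a RapidMiner feature: the helper names `collect_html_files` and `files_per_class` and the temporary demo folder layout are hypothetical.

```python
import csv
import tempfile
from pathlib import Path
from typing import Dict, List


def collect_html_files(root: Path) -> List[Path]:
    """Return all .html files below root, including every subfolder."""
    return sorted(p for p in root.rglob("*.html") if p.is_file())


def files_per_class(mapping_csv: Path) -> Dict[str, List[Path]]:
    """Read a CSV with 'class name' and 'directory' columns and collect
    the HTML files found recursively under each listed directory."""
    result: Dict[str, List[Path]] = {}
    with mapping_csv.open(newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            files = collect_html_files(Path(row["directory"]))
            result.setdefault(row["class name"], []).extend(files)
    return result


# Demo on a throwaway folder tree standing in for the "webpage" folder.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "webpage"
    (root / "news" / "2018").mkdir(parents=True)
    (root / "blog").mkdir()
    (root / "news" / "a.html").write_text("<html></html>", encoding="utf-8")
    (root / "news" / "2018" / "b.html").write_text("<html></html>", encoding="utf-8")
    (root / "blog" / "c.html").write_text("<html></html>", encoding="utf-8")
    (root / "blog" / "notes.txt").write_text("not html", encoding="utf-8")

    mapping = Path(tmp) / "classes.csv"
    with mapping.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["class name", "directory"])
        writer.writerow(["news", str(root / "news")])
        writer.writerow(["blog", str(root / "blog")])

    found = [p.relative_to(root).as_posix() for p in collect_html_files(root)]
    per_class = {name: len(paths) for name, paths in files_per_class(mapping).items()}

print(found)
print(per_class)
```

A list produced this way could then be written out again (for example as one row per file with its class) and fed to whatever import step accepts an explicit file list.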

Mutation prob of the operator "Optimize Parameters (Evolutionary)"


Hi,

When I use the operator "Optimize Parameters (Evolutionary)", the parameters include "mutation type", "selection type", "crossover prob", and so on.

However, where can I set the value of "mutation prob"? Or can you tell me what the default value is?
